
Remarkably, we find that this causality condition provides a natural framework to parameterize the cost function that is learned by the discriminator as arobust (worst-case) distance, and anideal mechanism for learning time dependent data distributions.